Motivation For this Notebook

Recently we have observed that most of our fellow community members at Kaggle build very simple and similar graphs using the same old libraries, without any interesting approaches or any kind of interactivity (apart from bouncing labels ;) ). So we decided to try some new and different kinds of graphs from the vast number of data visualization libraries available online. Hence we present this Kernel with some interesting graphs - DrillDown Charts, Network Plots & Motion Plots. Constructive criticism will be appreciated. Please upvote our work! Your support will motivate us to try more cool stuff and bring it to the community.


A. Developer Profile Analysis


This section deals with the analysis of the most important aspect of Stack Overflow - its community. It shows a detailed analysis of the various metrics that were explored during the EDA. We will take a detailed look at the Country, Education, Non-Degree Education Sources, Occupations, Job Experience, Demographics (such as Gender, Age) and the overall satisfaction of the users with Stack Overflow and its community.

1. What is their Country?

1.1 All Respondents by Country

data(worldgeojson, package = "highcharter")

by_country <- survey_results_public %>% select(Country) %>% filter(!is.na(Country)) %>%group_by(Country) %>% summarise(n1=n())
code <- countrycode(by_country$Country, 'country.name', 'iso3c')
by_country$iso3 <- code


p_by_country <- highchart() %>% 
                  hc_add_series_map(worldgeojson, by_country, value = "n1", joinBy = "iso3") %>% 
                  hc_colorAxis(stops = color_stops()) %>% 
                  hc_legend(enabled = TRUE) %>%  
                  hc_mapNavigation(enabled = TRUE) %>%
                  hc_title(text = "Respondent by Country")  %>%
                  hc_tooltip(useHTML = TRUE, headerFormat = "",
                            pointFormat = "Country: {point.Country} Total Respondent: {point.n1}") %>%  hc_add_theme(hc_theme_google())


professionals_result <- survey_results_public %>% filter(Student=="No")

professionals_by_country <- professionals_result %>% select(Country) %>% filter(!is.na(Country)) %>%group_by(Country) %>% summarise(n2=n())
code <- countrycode(professionals_by_country$Country, 'country.name', 'iso3c')
professionals_by_country$iso3 <- code

combined_result <- by_country %>% left_join(professionals_by_country, by="iso3") %>% select(iso3, Country.x, n1, n2)
names(combined_result) <- c("iso3", "Country", "n1", "n2")

data(worldgeojson, package = "highcharter")

p_professionals_by_country <- highchart() %>% 
                              hc_add_series_map(worldgeojson, combined_result, value = "n2", joinBy = "iso3") %>% 
                              hc_colorAxis(stops = color_stops()) %>% 
                              hc_legend(enabled = TRUE) %>% 
                              #hc_add_theme(hc_theme_google()) %>% 
                              hc_mapNavigation(enabled = TRUE) %>%
                              hc_title(text = "Professionals Respondent by Countries")  %>%
                              hc_tooltip(useHTML = TRUE, headerFormat = "",
                                        pointFormat = "Country: {point.Country} Professionals: {point.n2} Total Respondent: {point.n1}") %>% hc_add_theme(hc_theme_google())

lst <- list(
  p_by_country,
  p_professionals_by_country
)

hw_grid(lst, rowheight = 350)

Insight

  • The first graph gives the number of respondents from each country. The top 3 countries by number of respondents are:
    1. United States : 20309
    2. India : 13721
    3. Germany : 6459
  • The second graph gives the number of professional respondents (those who answered ‘No’ to being a student) from each country. The top 3 countries by number of ‘Professional’ respondents are:
    1. United States : 17220
    2. India : 8228
    3. United Kingdom : 5354

2. What is their Education?

2.1 Are they Students?

by_Student <- survey_results_public %>%
                        filter(!is.na(Student)) %>%
                        group_by(Student) %>%
                        summarise(Total = n())  %>%
                        arrange(desc(Total)) %>%
                        ungroup() %>%
                        mutate(Student = reorder(Student,Total)) %>%
                        mutate(Percent = round(Total/sum(Total)*100)) %>%
                        head(10)

highchart() %>%
  hc_xAxis(categories = by_Student$Student) %>% 
  hc_add_series(name = "Percent %", data = by_Student$Percent, colorByPoint =  1) %>% 
  hc_title(text = "Are Respondent Student")  %>%
  hc_chart(type = "bar", options3d = list(enabled = TRUE, beta = 1, alpha = 1)) %>% hc_add_theme(hc_theme_google())

Insight

  • It can be observed that 74% are not students while the remaining 26% are students.

2.2 Formal Education

by_FormalEducation <- survey_results_public %>%
                        filter(!is.na(FormalEducation)) %>%
                        group_by(FormalEducation) %>%
                        summarise(Total = n())  %>%
                        arrange(desc(Total)) %>%
                        ungroup() %>%
                        mutate(FormalEducation = reorder(FormalEducation,Total)) %>%
                        mutate(Percent = round(Total/sum(Total)*100)) %>%
                        head(10)

p_by_FormalEducation <- highchart() %>%
                          hc_xAxis(categories = by_FormalEducation$FormalEducation) %>% 
                          hc_add_series(name = "Percent %", data = by_FormalEducation$Percent, colorByPoint =  1) %>% 
                          hc_title(text = "Formal Education of Respondent")  %>%
                          hc_chart(type = "bar", options3d = list(enabled = TRUE, beta = 1, alpha = 1)) %>% hc_add_theme(hc_theme_google())


by_UndergradMajor <- survey_results_public %>%
                        filter(!is.na(UndergradMajor)) %>%
                        group_by(UndergradMajor) %>%
                        summarise(Total = n())  %>%
                        arrange(desc(Total)) %>%
                        ungroup() %>%
                        mutate(UndergradMajor = reorder(UndergradMajor,Total)) %>%
                        mutate(Percent = round(Total/sum(Total)*100)) %>%
                        head(10)

p_by_UndergradMajor <-  highchart() %>%
                          hc_xAxis(categories = by_UndergradMajor$UndergradMajor) %>% 
                          hc_add_series(name = "Percent %", data = by_UndergradMajor$Percent, colorByPoint =  1) %>% 
                          hc_title(text = "Main Field of Study of Respondent")  %>%
                          hc_chart(type = "bar", options3d = list(enabled = TRUE, beta = 1, alpha = 1)) %>% hc_add_theme(hc_theme_google())


lst <- list(
  p_by_FormalEducation,
  p_by_UndergradMajor
)

hw_grid(lst, rowheight = 400)

Insight

  • The first graph shows that the majority of respondents are Bachelor’s Degree holders, followed by Master’s Degree holders & other college graduates.
  • The second graph shows that Computer Science, Computer Engineering or Software Engineering is the main field of study, at 64% of respondents.

2.3 Non-Degree Education

by_EducationTypes <-    survey_results_public %>% 
                        select(Respondent,EducationTypes) %>% 
                        mutate(EducationTypes = strsplit(as.character(EducationTypes), ";")) %>% 
                        unnest(EducationTypes) %>%
                        filter(!is.na(EducationTypes)) %>%
                        group_by(EducationTypes) %>%
                        summarise(Total = n())  %>%
                        arrange(desc(Total)) %>%
                        ungroup() %>%
                        mutate(EducationTypes = reorder(EducationTypes,Total)) %>%
                        mutate(Percent = round(Total/nrow(survey_results_public)*100)) %>%
                        head(10)

p_by_EducationTypes <- highchart() %>%
                          hc_xAxis(categories = by_EducationTypes$EducationTypes) %>% 
                          hc_add_series(name = "Percent %", data = by_EducationTypes$Percent, colorByPoint =  1) %>% 
                          hc_title(text = "Non-Degree Education of Respondent")  %>%
                          hc_chart(type = "bar", options3d = list(enabled = TRUE, beta = 1, alpha = 1)) %>% hc_add_theme(hc_theme_google())

by_SelfTaughtTypes <-   survey_results_public %>% 
                        select(Respondent,SelfTaughtTypes) %>% 
                        mutate(SelfTaughtTypes = strsplit(as.character(SelfTaughtTypes), ";")) %>% 
                        unnest(SelfTaughtTypes) %>%
                        filter(!is.na(SelfTaughtTypes)) %>%
                        group_by(SelfTaughtTypes) %>%
                        summarise(Total = n())  %>%
                        arrange(desc(Total)) %>%
                        ungroup() %>%
                        mutate(SelfTaughtTypes = reorder(SelfTaughtTypes,Total)) %>%
                        mutate(Percent = round(Total/nrow(survey_results_public)*100)) %>%
                        head(10)

p_by_SelfTaughtTypes <- highchart() %>%
                          hc_xAxis(categories = by_SelfTaughtTypes$SelfTaughtTypes) %>% 
                          hc_add_series(name = "Percent %", data = by_SelfTaughtTypes$Percent, colorByPoint =  1) %>% 
                          hc_title(text = "Self Taught Types Education of Respondent")  %>%
                          hc_chart(type = "bar", options3d = list(enabled = TRUE, beta = 1, alpha = 1)) %>% hc_add_theme(hc_theme_google())

lst <- list(
  p_by_EducationTypes,
  p_by_SelfTaughtTypes
)

hw_grid(lst, rowheight = 400)

Insight

  • The first graph shows the ways in which respondents have educated themselves outside a formal degree. The most common sources are Self Learning at 60%, followed by Online Courses at 33% and Open Source Contribution at 28%.
  • The second graph breaks down the self-taught respondents further. The most common self-teaching methods are reading Official Documentation at 48%, Q&A on SO at 48% and Books & E-Books at 29%.
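
The percentages in both graphs are computed against all respondents, and each respondent can pick several options, so the bars need not add up to 100%. Below is a minimal base-R sketch of the same split-and-count idea. The `toy` data frame and its values are invented for illustration and are not survey data.

```r
# Toy stand-in for survey_results_public; the values are invented.
toy <- data.frame(
  Respondent = 1:4,
  EducationTypes = c("Self-taught;Online course", "Self-taught", NA,
                     "Online course;Hackathon"),
  stringsAsFactors = FALSE
)

# Split each multi-select answer on ";" and flatten into one vector of choices.
choices <- unlist(strsplit(toy$EducationTypes, ";", fixed = TRUE))
choices <- choices[!is.na(choices)]

# Count each option, then divide by the number of respondents (not of choices).
totals  <- sort(table(choices), decreasing = TRUE)
percent <- round(100 * totals / nrow(toy))

print(percent)
```

With this denominator a respondent who gave no answer still counts towards the total, which is why the shares describe "all respondents" rather than "respondents who answered".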

2.4 Types of Non-Degree Education Vs Time to get a Full-time Job as a Developer

survey_results_public2 <-  survey_results_public %>%    mutate(EducationTypes = strsplit(as.character(EducationTypes), ";"))  %>%
                                                        unnest(EducationTypes)

df1 <- survey_results_public2 %>%
       filter(!is.na(EducationTypes)) %>%
       group_by(name = EducationTypes, drilldown = tolower(EducationTypes)) %>% 
       summarise(y = n()) %>% arrange(desc(y))

df2 <-survey_results_public2 %>% filter(!is.na(EducationTypes)) %>% filter(!is.na(TimeAfterBootcamp)) %>% group_by(EducationTypes,TimeAfterBootcamp) %>% dplyr::mutate(y = n(),colorByPoint =  1) %>%arrange(desc(y))%>%
  group_by(name = EducationTypes, id = tolower(EducationTypes),colorByPoint) %>% 
  do(data = list_parse(
                  mutate(.,name = TimeAfterBootcamp, drilldown = tolower(paste(EducationTypes,TimeAfterBootcamp,sep=": "))) %>% 
                      group_by(name,drilldown) %>% 
                        summarise(y=n())%>% dplyr::select(name, y, drilldown)   %>%
                            arrange(desc(y))) 
    )
    
highchart() %>% 
  hc_chart(type = "column") %>%
  hc_title(text = 'Types of Non-Degree Education Vs Time to get a Full-time Job as a Developer ') %>%
  hc_add_series(data = df1, name = "Types of Non-Degree Education",colorByPoint =  1) %>% 
  hc_legend(enabled = FALSE) %>%
  hc_xAxis(type = "category") %>% 
  hc_drilldown(
    allowPointDrilldown = TRUE,
    series =list_parse(df2)
  ) %>% hc_add_theme(hc_theme_google())

Insight

  • Note : It’s a Drill down graph.
  • This graph illustrates the time taken by respondents to get a Full Time Job as a Developer after learning from Non-Degree Education sources.
  • At the upper level it shows the number of respondents for each source. After drilling down, it shows the number of respondents in each time interval.
  • For every knowledge source, most respondents were ‘Already Full Time Developers’.
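
A drilldown chart is driven by matching identifiers: each top-level point carries a `drilldown` value, and the chart looks for a nested series whose `id` equals it. The sketch below builds that structure as plain R lists from invented counts, without calling highcharter. It only illustrates the shape that `df1` and `list_parse(df2)` are meant to have. The names `counts`, `top_level` and `nested` are hypothetical.

```r
# Invented counts: education type -> time-to-job category -> respondents.
counts <- list(
  "Online course" = c("Already a developer" = 5, "Less than a month" = 2),
  "Bootcamp"      = c("Already a developer" = 3, "One to three months" = 1)
)

# Top level: one point per education type; `drilldown` holds the id to look up.
top_level <- lapply(names(counts), function(type) {
  list(name = type, y = sum(counts[[type]]), drilldown = tolower(type))
})

# Nested level: one series per education type; `id` must equal the parent's `drilldown`.
nested <- lapply(names(counts), function(type) {
  list(
    id   = tolower(type),
    name = type,
    data = unname(Map(function(label, n) list(name = label, y = n),
                      names(counts[[type]]), counts[[type]]))
  )
})

str(top_level[[1]])
str(nested[[1]], max.level = 2)
```

If a `drilldown` value has no series with the same `id`, clicking that bar does nothing, so checking that the two sets of identifiers agree is a cheap sanity test.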

2.5 Network Analysis : Education Types

df <- survey_results_public %>% select(Respondent,EducationTypes)
df2 <- df %>% 
         mutate(EducationTypes = strsplit(as.character(EducationTypes), ";")) %>% 
         unnest(EducationTypes)
         
df2_edges <- df2 %>% group_by(Respondent) %>%
             filter(n()>=2) %>%
             do(data.frame(t(combn((.)[["EducationTypes"]], 2)), stringsAsFactors=FALSE)) %>% ungroup() %>%
             rename(source = X1, target = X2) %>%
             select(-Respondent)

df2_edges <- df2_edges %>% group_by(source,target) %>% summarise(weight=n())

names(df2_edges) <- c("from","to","weight")
df2_edges$weight <- df2_edges$weight/1500

df2_edges$width <- 1+df2_edges$weight # line width
df2_edges$color <- "gray"    # line color  
#df2_edges$arrows <- "middle" # arrows: 'from', 'to', or 'middle'
df2_edges$smooth <- FALSE    # should the edges be curved?
df2_edges$shadow <- FALSE    # edge shadow

df2_nodes <- df2 %>% filter(!is.na(EducationTypes)) %>% group_by(EducationTypes) %>% summarise(n = n()/1000) %>% arrange(desc(n))
names(df2_nodes) <- c("id","size")

n <- nrow(df2_nodes)
palette <- distinctColorPalette(n)

df2_nodes$shape <- "dot"  
df2_nodes$shadow <- TRUE # Nodes will drop shadow
df2_nodes$title <- df2_nodes$id # Text on click
df2_nodes$label <- df2_nodes$id # Node label
df2_nodes$size <- df2_nodes$size # Node size
df2_nodes$borderWidth <- 2 # Node border width

df2_nodes$color.background <- palette[as.numeric(as.factor(df2_nodes$id))]
df2_nodes$color.border <- "black"
df2_nodes$color.highlight.background <- "orange"
df2_nodes$color.highlight.border <- "darkred"

visNetwork(df2_nodes, df2_edges, height = "500px", width = "100%") %>% visIgraphLayout(layout = "layout_with_lgl") %>% 
  visEdges(shadow = TRUE,
           color = list(color = "gray", highlight = "orange"))

Insight

  • This is a Network Graph for the different Non-Degree Education sources.
  • Each node denotes an education source, and the size of the node denotes the number of respondents that used that source.
  • Each edge between two nodes denotes that some respondents chose both education sources, and the width of the edge denotes the number of respondents that chose both.
  • It can be observed that the 5 pairs chosen together most often are:
    1. ‘Taught yourself a new language…’ — ‘Taken an online course…’
    2. ‘Taught yourself a new language…’ — ‘Contributed to open source software’
    3. ‘Taught yourself a new language…’ — ‘Received on the job training in software development’
    4. ‘Taught yourself a new language…’ — ‘Participated in a hackathon’
    5. ‘Taken an online course…’ — ‘Contributed to open source software’
  • A wide edge means that many respondents chose both options of a pair. It does not mean that everyone who chose the first option also chose the second.
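
The edge weights are co-occurrence counts. A small base-R sketch of how such counts can be built is given below, using invented answers `A`, `B`, `C` in place of the real education types. Unlike the chunk above, this sketch sorts each respondent's choices first, so that the pair A-B and the pair B-A always land on the same edge. That is a design choice of the sketch, not something the original code does.

```r
# Invented multi-select answers, one string per respondent.
answers <- c("A;B;C", "B;A", "C", NA, "A;C")
chosen_list <- strsplit(answers, ";", fixed = TRUE)

# For each respondent with at least two choices, list every unordered pair.
pairs <- lapply(chosen_list, function(chosen) {
  chosen <- sort(unique(chosen[!is.na(chosen)]))  # fixed order so A-B equals B-A
  if (length(chosen) < 2) return(NULL)
  t(combn(chosen, 2))
})
edges <- as.data.frame(do.call(rbind, pairs), stringsAsFactors = FALSE)
names(edges) <- c("from", "to")

# Edge weight = number of respondents who chose both options.
edges <- aggregate(weight ~ from + to, data = cbind(edges, weight = 1), FUN = sum)

# Share of "A" respondents who also chose "B": the weight alone does not give this.
n_A <- sum(vapply(chosen_list, function(ch) "A" %in% ch, logical(1)))
share_A_with_B <- edges$weight[edges$from == "A" & edges$to == "B"] / n_A

print(edges)
print(share_A_with_B)
```

The last two lines show why a heavy edge is not the same as "everyone who chose A also chose B": the share has to be computed against the number of respondents who chose A.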

2.6 Hackathon Reasons

by_HackathonReasons <-   survey_results_public %>% 
                        select(Respondent,HackathonReasons) %>% 
                        mutate(HackathonReasons = strsplit(as.character(HackathonReasons), ";")) %>% 
                        unnest(HackathonReasons) %>%
                        filter(!is.na(HackathonReasons)) %>%
                        group_by(HackathonReasons) %>%
                        summarise(Total = n())  %>%
                        arrange(desc(Total)) %>%
                        ungroup() %>%
                        mutate(HackathonReasons = reorder(HackathonReasons,Total)) %>%
                        mutate(Percent = round(Total/nrow(survey_results_public)*100)) %>%
                        head(10)

highchart() %>%
  hc_xAxis(categories = by_HackathonReasons$HackathonReasons) %>% 
  hc_add_series(name = "Percent %", data = by_HackathonReasons$Percent, colorByPoint =  1) %>% 
  hc_title(text = "Reasons For Participating in Hackathon")  %>%
  hc_chart(type = "bar", options3d = list(enabled = TRUE, beta = 1, alpha = 1)) %>% hc_add_theme(hc_theme_google())

Insight

  • This is a bar graph illustrating the various reasons given by the respondents for participating in hackathons.
  • The top 3 most common reasons are:
    1. ‘Because I find it enjoyable’ : 20%
    2. ‘To improve my general technical skills or programming ability’ : 17%
    3. ‘To improve my knowledge of a specific programming language…’ : 13%

3. What is their Occupation?

3.1 DevType : How do they describe themselves?

by_DevType <-   survey_results_public %>% 
                        select(Respondent,DevType) %>% 
                        mutate(DevType = strsplit(as.character(DevType), ";")) %>% 
                        unnest(DevType) %>%
                        filter(!is.na(DevType)) %>%
                        group_by(DevType) %>%
                        summarise(Total = n())  %>%
                        arrange(desc(Total)) %>%
                        ungroup() %>%
                        mutate(DevType = reorder(DevType,Total)) %>%
                        mutate(Percent = round(Total/nrow(survey_results_public)*100)) %>%
                        head(10)

highchart() %>%
  hc_xAxis(categories = by_DevType$DevType) %>% 
  hc_add_series(name = "Percent %", data = by_DevType$Percent, colorByPoint =  1) %>% 
  hc_title(text = "Developer Type")  %>%
  hc_chart(type = "bar", options3d = list(enabled = TRUE, beta = 1, alpha = 1)) %>% hc_add_theme(hc_theme_google())

Insight

  • This bar graph illustrates the different kinds of developers.
  • The graph shows the 10 most common developer types.
  • The top 5 most common types are:
    1. Back-end developer : 54%
    2. Full-stack developer : 45%
    3. Front-end developer : 35%
    4. Mobile developer : 19%
    5. Desktop or enterprise application developer : 16%

3.2 Network Analysis : DevType

df <- survey_results_public %>% select(Respondent,DevType)
df2 <- df %>% 
         mutate(DevType = strsplit(as.character(DevType), ";")) %>% 
         unnest(DevType)
         
df2_edges <- df2 %>% group_by(Respondent) %>%
             filter(n()>=2) %>%
             do(data.frame(t(combn((.)[["DevType"]], 2)), stringsAsFactors=FALSE)) %>% ungroup() %>%
             rename(source = X1, target = X2) %>%
             select(-Respondent)

df2_edges <- df2_edges %>% group_by(source,target) %>% summarise(weight=n())


names(df2_edges) <- c("from","to","weight")
df2_edges$weight <- df2_edges$weight/1500

df2_edges$width <- 1+df2_edges$weight # line width
df2_edges$color <- "gray"    # line color  
#df2_edges$arrows <- "middle" # arrows: 'from', 'to', or 'middle'
df2_edges$smooth <- FALSE    # should the edges be curved?
df2_edges$shadow <- FALSE    # edge shadow

df2_nodes <- df2 %>% filter(!is.na(DevType)) %>% group_by(DevType) %>% summarise(n = n()/700) %>% arrange(desc(n))
names(df2_nodes) <- c("id","size")

n <- nrow(df2_nodes)
palette <- distinctColorPalette(n)

df2_nodes$shape <- "dot"  
df2_nodes$shadow <- TRUE # Nodes will drop shadow
df2_nodes$title <- df2_nodes$id # Text on click
df2_nodes$label <- df2_nodes$id # Node label
df2_nodes$size <- df2_nodes$size # Node size
df2_nodes$borderWidth <- 2 # Node border width

df2_nodes$color.background <- palette[as.numeric(as.factor(df2_nodes$id))]
df2_nodes$color.border <- "black"
df2_nodes$color.highlight.background <- "orange"
df2_nodes$color.highlight.border <- "darkred"

visNetwork(df2_nodes, df2_edges, height = "500px", width = "100%") %>% visIgraphLayout(layout = "layout_with_lgl") %>% 
  visEdges(shadow = TRUE,
           color = list(color = "gray", highlight = "orange"))

Insight

  • This is a network graph for the different types of developers.
  • Each node represents a type of developer, and the size of the node denotes the number of developers belonging to that type.
  • Each edge between two nodes denotes that some respondents chose both developer types. The width of each edge denotes the number of respondents that chose both.
  • It can be observed that the 5 pairs chosen together most often are:
    1. Back end developer — Full stack developer
    2. Front end developer — Full stack developer
    3. Back end developer — Front end developer
    4. Full stack developer — Database administrator
    5. Desktop or enterprise application developer — Back end developer

3.3 Who are the Open Source project contributors?

survey_results_public2 <-  survey_results_public %>%    mutate(DevType = strsplit(as.character(DevType), ";"))  %>%
                                                        unnest(DevType)

df1 <- survey_results_public2 %>%
       filter(!is.na(OpenSource)) %>%
       group_by(name = OpenSource, drilldown = tolower(OpenSource)) %>% 
       summarise(y = n()) %>% arrange(desc(y))

df2 <-survey_results_public2 %>% filter(!is.na(OpenSource)) %>% filter(!is.na(DevType)) %>% group_by(OpenSource,DevType) %>% dplyr::mutate(y = n(),colorByPoint =  1) %>%arrange(desc(y))%>%
  group_by(name = OpenSource, id = tolower(OpenSource),colorByPoint) %>% 
  do(data = list_parse(
                  mutate(.,name = DevType, drilldown = tolower(paste(OpenSource,DevType,sep=": "))) %>% 
                      group_by(name,drilldown) %>% 
                        summarise(y=n())%>% dplyr::select(name, y, drilldown)   %>%
                            arrange(desc(y))) 
    )
    
highchart() %>% 
  hc_chart(type = "column") %>%
  hc_title(text = 'Who Contributed to opensource') %>%
  hc_add_series(data = df1, name = "Is Contributed to opensource",colorByPoint =  1) %>% 
  hc_legend(enabled = FALSE) %>%
  hc_xAxis(type = "category") %>% 
  hc_drilldown(
    allowPointDrilldown = TRUE,
    series =list_parse(df2)
  ) %>% hc_add_theme(hc_theme_google())

Insight

  • Note : It’s a Drill down graph.
  • This graph illustrates the detailed analysis of Open Source contributors.
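  • At the upper level, the graph illustrates whether the respondent answered Yes or No for Open Source project contribution.
  • At the deeper level, for each answer choice, the graph illustrates the number of respondents belonging to each developer type.
  • The upper-level totals are computed after DevType is split, so a respondent who selected several developer types is counted once per type:
    • Yes : 135508
    • No : 147354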
  • Top 3 developer types with highest respondents choosing “Yes” are:
    1. Back-end developer : 24717
    2. Full-stack developer : 15949
    3. Front-end developer : 20829
  • Top 3 developer types with highest respondents choosing “No” are:
    1. Back-end developer : 28583
    2. Full-stack developer : 23524
    3. Front-end developer : 18873
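
Because `df1` is built from the table in which DevType has already been split into one row per developer type, the top-level totals count rows rather than distinct respondents. The base-R sketch below makes the difference visible on three invented respondents. The `toy` and `long` names are hypothetical, and the expansion step only mimics what `unnest()` does.

```r
# Invented data: three respondents, one of whom selected two developer types.
toy <- data.frame(
  Respondent = c(1, 2, 3),
  OpenSource = c("Yes", "No", "Yes"),
  DevType    = c("Back-end developer;Full-stack developer",
                 "Front-end developer", "Back-end developer"),
  stringsAsFactors = FALSE
)

# One row per (respondent, developer type), mirroring the unnest() step.
types <- strsplit(toy$DevType, ";", fixed = TRUE)
long  <- toy[rep(seq_len(nrow(toy)), lengths(types)), c("Respondent", "OpenSource")]
long$DevType <- unlist(types)

rows_per_answer        <- table(long$OpenSource)  # what the top-level bars count
respondents_per_answer <- table(toy$OpenSource)   # distinct respondents

print(rows_per_answer)
print(respondents_per_answer)
```

Computing the top level from the original table and only the nested level from the split table would be one way to show distinct respondents at the top.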

3.4 Coding as Hobby

survey_results_public2 <-  survey_results_public %>%    mutate(DevType = strsplit(as.character(DevType), ";"))  %>%
                                                        unnest(DevType)

df1 <- survey_results_public2 %>%
       filter(!is.na(Hobby)) %>%
       group_by(name = Hobby, drilldown = tolower(Hobby)) %>% 
       summarise(y = n()) %>% arrange(desc(y))

df2 <-survey_results_public2 %>% filter(!is.na(Hobby)) %>% filter(!is.na(DevType)) %>% group_by(Hobby,DevType) %>% dplyr::mutate(y = n(),colorByPoint =  1) %>%arrange(desc(y))%>%
  group_by(name = Hobby, id = tolower(Hobby),colorByPoint) %>% 
  do(data = list_parse(
                  mutate(.,name = DevType, drilldown = tolower(paste(Hobby,DevType,sep=": "))) %>% 
                      group_by(name,drilldown) %>% 
                        summarise(y=n())%>% dplyr::select(name, y, drilldown)   %>%
                            arrange(desc(y))) 
    )
    
highchart() %>% 
  hc_chart(type = "column") %>%
  hc_title(text = 'Who have code as Hobby') %>%
  hc_add_series(data = df1, name = "Is Coding Hobby",colorByPoint =  1) %>% 
  hc_legend(enabled = FALSE) %>%
  hc_xAxis(type = "category") %>% 
  hc_drilldown(
    allowPointDrilldown = TRUE,
    series =list_parse(df2)
  ) %>% hc_add_theme(hc_theme_google())

Insight

  • Note : It’s a Drill down graph.
  • This graph illustrates the detailed analysis of ‘Is Coding a Hobby’ respondents.
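  • At the upper level, the graph illustrates whether the respondent answered Yes or No for ‘Is Coding a Hobby’.
  • At the deeper level, for each answer choice, the graph illustrates the number of respondents belonging to each developer type.
  • The upper-level totals are computed after DevType is split, so a respondent who selected several developer types is counted once per type:
    • Yes : 235293
    • No : 46569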
  • Top 3 developer types with highest respondents choosing “Yes” are:
    1. Back-end developer : 43791
    2. Full-stack developer : 36815
    3. Front-end developer : 28827
  • Top 3 developer types with highest respondents choosing “No” are:
    1. Back-end developer : 9509
    2. Full-stack developer : 7538
    3. Front-end developer : 5995

4. What is their Experience?

4.1 No. of Years Coded Vs No. of Years Coded Professionally

df1 <- survey_results_public %>% filter(!is.na(YearsCoding)) %>%
  group_by(name = YearsCoding, drilldown = tolower(YearsCoding)) %>% 
  summarise(y = n()) %>% arrange(desc(y)) %>% head(10)


df2 <-survey_results_public %>% filter(!is.na(YearsCoding)) %>% filter(!is.na(YearsCodingProf)) %>%  group_by(YearsCoding,YearsCodingProf) %>% dplyr::mutate(y = n(),colorByPoint =  1) %>%arrange(desc(y))%>%
  group_by(name = YearsCoding, id = tolower(YearsCoding),colorByPoint) %>% 
  do(data = list_parse(
                  mutate(.,name = YearsCodingProf, drilldown = tolower(paste(YearsCoding,YearsCodingProf,sep=": "))) %>% 
                      group_by(name,drilldown) %>% 
                        summarise(y=n())%>% dplyr::select(name, y, drilldown)   %>%
                            arrange(desc(y)))
    )

highchart() %>% 
  hc_chart(type = "column") %>%
  hc_title(text = 'No Of Years Coding Vs No Of Years Coded Professionally') %>%
  hc_add_series(data = df1, name = "No Of Years Coding",colorByPoint =  1) %>% 
  hc_legend(enabled = FALSE) %>%
  hc_xAxis(type = "category") %>% 
  hc_yAxis(title = list(text = "Total Response"))  %>%
  hc_drilldown(
    allowPointDrilldown = TRUE,
    series = list_parse(df2)
  ) %>% hc_add_theme(hc_theme_google())

Insight

  • Note : It’s a Drill down graph.
  • This graph illustrates the detailed analysis of ‘No. of years Coding’ and ‘No. of years Coding Professionally’ of the respondents.
  • At the upper level, the graph illustrates the number of respondents for each interval of ‘No. of years Coding’.
  • At the deeper level, for each interval of ‘No. of years Coding’, the graph illustrates the ‘No. of years Coding professionally’ of the respondents.
  • For example, if X people have coded for 6-8 years, drilling down into that bar shows for how many of those years they have coded professionally.
  • Top 3 categories with highest number of respondents are:
    1. 3-5 years : 23313
       Professional Coding Years:
         1. 0-2 years : 10104
         2. 3-5 years : 8126
    2. 6-8 years : 19338
       Professional Coding Years:
         1. 3-5 years : 7678
         2. 0-2 years : 4753
         3. 6-8 years : 4064
    3. 9-11 years : 12169
       Professional Coding Years:
         1. 6-8 years : 3616
         2. 3-5 years : 3356
         3. 9-11 years : 2408
         4. 0-2 years : 1215
  • IMPORTANT INSIGHT : Within each interval, a very small number of respondents report a higher interval for ‘Professional Coding Years’ than for ‘Coding Years’, which is not possible. These are the outliers in the data.

5. What are the Demographics?

5.1 Gender Distribution

by_Gender <-   survey_results_public %>% 
                mutate(Gender = strsplit(as.character(Gender), ";"))  %>%
                unnest(Gender) %>%
                filter(!is.na(Gender)) %>%
                group_by(Gender) %>%
                summarise(n = n()) %>%
                mutate(percentage = round((n / sum(n))*100))

highchart() %>% 
 hc_chart(type = "pie") %>% 
 hc_title(text = "Gender Distribution") %>%
 hc_add_series_labels_values(labels = by_Gender$Gender, values = by_Gender$percentage) %>% hc_add_theme(hc_theme_google())

Insight

  • This graph illustrates the Gender Distribution of Respondents.
    1. Male : 92%
    2. Female : 7%
    3. Non Binary : 1%
    4. Transgender : 1%
  • It can be inferred that the large majority of respondents are Male.

5.2 Years of Experience by Gender

by_yearOfExp_gender <- survey_results_public %>% select(YearsCoding, Gender) %>%
                              filter(!is.na(YearsCoding)) %>%
                              mutate(YearsCodingNum = parse_number(YearsCoding),
                                     Gender = str_split(Gender, pattern = ";")) %>%
                              unnest(Gender) %>%
                              mutate(Gender = case_when(str_detect(Gender, "Non-binary") ~ "Non-binary",
                                                        TRUE ~ Gender)) %>%
                              group_by(YearsCodingNum, Gender) %>%
                              summarise(n = n()) %>% 
                              filter(Gender %in% c("Male", "Female", "Non-binary"))

hchart(by_yearOfExp_gender, "line", hcaes(x = YearsCodingNum, y = n, group = Gender))  %>%
hc_title(text = 'Gender Vs Years of Experience') %>%
hc_xAxis(title = list(text = "Years of Experience")) %>% 
hc_yAxis(title = list(text = "No Of Male/Female")) %>% hc_add_theme(hc_theme_google())

Insight

  • This graph plots, for each number of years of coding experience, the number of respondents of each gender.
  • The x-axis value is the first number of each ‘YearsCoding’ interval, so 3 Years stands for the ‘3-5 years’ interval.
  • Years of Experience with highest number of respondents : 3 Years
    1. Male 13101
    2. Female 1349
    3. Non-binary 136
  • It can also be inferred that the number of respondents generally declines as experience increases beyond 3 years for all the genders, except for respondents with more than 30 years of experience.
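
The numeric x-axis comes from `parse_number()`, which keeps the first number it finds in each interval label. The base-R lines below approximate that extraction so the mapping from label to axis value is easy to see. The regular expression is only a stand-in for `readr::parse_number()`, not its implementation, and the label "30 or more years" is an assumed spelling of the open-ended interval.

```r
# Interval labels as they appear in the YearsCoding column (last one assumed).
labels <- c("0-2 years", "3-5 years", "6-8 years", "30 or more years")

# Pull out the first run of digits in each label and convert it to a number.
first_number <- as.numeric(regmatches(labels, regexpr("[0-9]+", labels)))

print(data.frame(label = labels, YearsCodingNum = first_number))
```

Each point on the line chart therefore represents a whole interval, plotted at its lower bound.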

5.3 Gender % by Years of Experience

by_yearOfExp_gender$Percent <- (by_yearOfExp_gender$n/sum(by_yearOfExp_gender$n)*100)

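# Reshape to one column per gender; value.var names the value column explicitly instead of letting dcast() guess it
aqw <- dcast(by_yearOfExp_gender, YearsCodingNum ~ Gender, value.var = "Percent")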
aqw$sum <- aqw$Female + aqw$Male + aqw$`Non-binary`

highchart() %>% 
  hc_title(text = 'Gender % Vs Years of Experience') %>%
  hc_chart(type = "column") %>% 
  hc_xAxis(categories = aqw$YearsCodingNum,title = list(text = "Years of Experience")) %>% 
  hc_add_series(data = aqw$sum) %>%
  hc_add_series(name = "Male",type = "line", data = aqw$Male) %>%
  hc_add_series(name = "Female",type = "line", data = aqw$Female) %>%
  hc_add_series(name = "Non-binary",type = "line", data = aqw$`Non-binary`) %>%
  hc_yAxis(title = list(text = "% Of Male/Female/Non-binary"))  %>% hc_add_theme(hc_theme_google())

Insight

  • The lines in this graph follow the same pattern as above, but as a % of all respondents. The bars give the combined % of all the respondents irrespective of gender.
  • Years of Experience with highest number of respondents :
    • Interval : 3 Years
    • % of Respondents : 22.52%
  • It can be inferred that overall as well, the number of respondents generally declines as experience increases beyond 3 years, except for respondents with more than 30 years of experience.
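
The percentages here are shares of the grand total, reshaped so that each gender becomes its own column and the bar is the row sum. A base-R sketch of the same two steps on invented counts follows. It uses `xtabs()` in place of `dcast()` purely to stay dependency-free, and the `long` and `wide` names are hypothetical.

```r
# Invented counts per experience bucket and gender.
long <- data.frame(
  YearsCodingNum = c(0, 0, 3, 3),
  Gender         = c("Female", "Male", "Female", "Male"),
  n              = c(10, 40, 5, 45)
)

# Each cell as a percentage of all respondents, as in the Percent column above.
long$Percent <- 100 * long$n / sum(long$n)

# Wide layout: one row per experience bucket, one column per gender.
wide <- as.data.frame.matrix(xtabs(Percent ~ YearsCodingNum + Gender, data = long))
wide$sum <- rowSums(wide)

print(wide)
```

Because every cell is divided by the same grand total, the `sum` column adds up to 100 across all rows, which is what makes the bars comparable with the lines.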

6. How is their connection with other Developers?

6.1 Kinship Vs Competition Vs Self-Evaluation

x <-data.frame(table(survey_results_public$AgreeDisagree1))
y <- data.frame(table(survey_results_public$AgreeDisagree2))
z <-data.frame(table(survey_results_public$AgreeDisagree3))

highchart() %>% 
  hc_title(text = 'Kinship vs Competition vs Self-Evaluation') %>%
  hc_chart(type = "column") %>% 
  hc_xAxis(categories = x$Var1,title = list(text = "Agreement Level Scale")) %>% 
  hc_add_series(name = "I feel a sense of kinship or connection to other developers",type = "line", data = x$Freq) %>%
  hc_add_series(name = "I think of myself as competing with my peers",type = "line", data = y$Freq) %>%
  hc_add_series(name = "I'm not as good at programming as most of my peers",type = "line", data = z$Freq) %>%
  hc_yAxis(title = list(text = "Total"))  %>% hc_add_theme(hc_theme_google())

Insight

  • This graph gives the number of respondents for each agreement level on the three questions - ‘Sense of kinship’, ‘Competing with peers’ and ‘Not a good programmer’.
  • Highest number of responses for each question :
    1. ‘Kinship’ - 36777 Agree
    2. ‘Competition’ - 18673 Agree
    3. ‘Self-Evaluation’ - 23341 Disagree
  • Lowest number of responses for each question :
    1. ‘Kinship’ - 1491 Strongly Disagree
    2. ‘Competition’ - 5329 Strongly Agree
    3. ‘Self-Evaluation’ - 2652 Strongly Agree
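
Since `table()` orders plain character values alphabetically, the agreement scale on the x-axis may not appear in its natural order. A minimal sketch of one way to fix the order is shown below. The level labels are an assumption and should be checked against the actual column values; the answers are toy data.

```r
# Assumed Likert labels, in scale order (check against the actual column values)
likert_levels <- c("Strongly disagree", "Disagree", "Neither Agree nor Disagree",
                   "Agree", "Strongly agree")

# Toy answers (hypothetical, not survey data)
toy <- data.frame(
  AgreeDisagree1 = rep(c("Agree", "Disagree", "Strongly agree"), times = c(4, 2, 1)),
  stringsAsFactors = FALSE
)

# Fixing the factor levels keeps the scale order and reports empty levels as 0
toy$AgreeDisagree1 <- factor(toy$AgreeDisagree1, levels = likert_levels)
x <- data.frame(table(toy$AgreeDisagree1))

# Most common answer, as reported in the insight bullets
most_common <- as.character(x$Var1[which.max(x$Freq)])
print(x)
print(most_common)
```

Using the same `levels` vector for all three questions also guarantees that the three series line up on a shared x-axis.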

6.2 Kinship, Competition & Self-Evaluation by Years of Experience

by_agreeDisagree1_yearOfExp <- survey_results_public %>%
                       filter(!is.na(AgreeDisagree1)) %>%
                       mutate(YearsCodingNum = parse_number(YearsCoding)) %>%
                       group_by(AgreeDisagree1,YearsCodingNum) %>%
                       summarise(n = n()) %>% 
                       select(AgreeDisagree1, YearsCodingNum, n)

by_agreeDisagree1_yearOfExp$Percent <- (by_agreeDisagree1_yearOfExp$n/sum(by_agreeDisagree1_yearOfExp$n)*100)

p_by_agreeDisagree1_yearOfExp <- hchart(by_agreeDisagree1_yearOfExp, "line", hcaes(x = YearsCodingNum, y = Percent, group = AgreeDisagree1))  %>%
                                    hc_title(text = 'Kinship by Years of Experience') %>%
                                    hc_xAxis(title = list(text = "Years of Experience")) %>% 
                                    hc_yAxis(title = list(text = "Percentage of Agreement Level"))  %>% hc_add_theme(hc_theme_google())

by_agreeDisagree2_yearOfExp <- survey_results_public %>%
                       filter(!is.na(AgreeDisagree2)) %>%
                       mutate(YearsCodingNum = parse_number(YearsCoding)) %>%
                       group_by(AgreeDisagree2,YearsCodingNum) %>%
                       summarise(n = n()) %>% 
                       select(AgreeDisagree2, YearsCodingNum, n)

by_agreeDisagree2_yearOfExp$Percent <- (by_agreeDisagree2_yearOfExp$n/sum(by_agreeDisagree2_yearOfExp$n)*100)

p_by_agreeDisagree2_yearOfExp <- hchart(by_agreeDisagree2_yearOfExp, "line", hcaes(x = YearsCodingNum, y = Percent, group = AgreeDisagree2))  %>%
                                    hc_title(text = 'Competition by Years of Experience') %>%
                                    hc_xAxis(title = list(text = "Years of Experience")) %>% 
                                    hc_yAxis(title = list(text = "Percentage of Agreement Level"))  %>% hc_add_theme(hc_theme_google())


by_agreeDisagree3_yearOfExp <- survey_results_public %>%
                       filter(!is.na(AgreeDisagree3)) %>%
                       mutate(YearsCodingNum = parse_number(YearsCoding)) %>%
                       group_by(AgreeDisagree3,YearsCodingNum) %>%
                       summarise(n = n()) %>% 
                       select(AgreeDisagree3, YearsCodingNum, n)

by_agreeDisagree3_yearOfExp$Percent <- (by_agreeDisagree3_yearOfExp$n/sum(by_agreeDisagree3_yearOfExp$n)*100)

p_by_agreeDisagree3_yearOfExp <- hchart(by_agreeDisagree3_yearOfExp, "line", hcaes(x = YearsCodingNum, y = Percent, group = AgreeDisagree3))  %>%
                                    hc_title(text = 'Self-Evaluation by Years of Experience') %>%
                                    hc_xAxis(title = list(text = "Years of Experience")) %>% 
                                    hc_yAxis(title = list(text = "Percentage of Agreement Level"))  %>% hc_add_theme(hc_theme_google())
                                    
lst <- list(
  p_by_agreeDisagree1_yearOfExp,
  p_by_agreeDisagree2_yearOfExp,
  p_by_agreeDisagree3_yearOfExp
)

hw_grid(lst, rowheight = 400)

Insight

  • This graph gives the share of respondents who answered the three questions from the section above, broken down by their Years of Experience.
  • It can be observed that, for each question, the highest number of respondents have 3 years of experience.
  • ‘Kinship’ - has clear ‘Agree’ responses from all the categories of Years of Experience.
  • ‘Competition’ - has more ‘Agree’ responses up to 9 Years of Experience. After that, it shows a higher number of ‘Disagree’ responses.
  • ‘Self-Evaluation’ - has more ‘Disagree’ responses up to 27 Years of Experience. After that, it shows a higher number of ‘Strongly Disagree’ responses.
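
The percentages above are computed against the total of all answers, which is why the 3-year bucket dominates every chart. As a hedged alternative (not what the charts above use), the share can also be computed within each experience bucket, which makes crossovers such as ‘Agree’ vs ‘Disagree’ easier to read. A base R sketch on made-up counts:

```r
# Toy counts per (experience bucket, answer); hypothetical numbers
toy <- data.frame(
  YearsCodingNum = c(3, 3, 9, 9),
  Answer = c("Agree", "Disagree", "Agree", "Disagree"),
  n = c(60, 40, 10, 15)
)

# Share of all answers (the normalization used in the charts above)
toy$PercentOfAll <- toy$n / sum(toy$n) * 100

# Share within each experience bucket (alternative normalization)
toy$PercentWithinYear <- toy$n / ave(toy$n, toy$YearsCodingNum, FUN = sum) * 100

print(toy)
```

With the within-bucket version, each experience bucket sums to 100%, so buckets with few respondents are no longer flattened near zero.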

B. Technology Analysis


This section deals with the analysis of the primary purpose of Stack Overflow - the technology and discussions. During the survey, the respondents were asked various questions regarding the technologies that they work on. This section delivers all the related insights gathered during the EDA which includes Programming Languages, Databases, Software Dev Platforms, Frameworks and IDEs.

6. What is the Salary paid for different Technologies?

6.1 Developer Salary by Country

by_country_salary <- survey_results_public %>% select(Country, Salary) %>% mutate(Salary=as.numeric(Salary))  %>% filter(!is.na(Country)) %>% filter(!is.na(Salary)) %>%group_by(Country) %>% summarize(AvgSalary = median(Salary, na.rm=TRUE))

data(worldgeojson, package = "highcharter")
code <- countrycode(by_country_salary$Country, 'country.name', 'iso3c')
by_country_salary$iso3 <- code
by_country_salary$AvgSalary <- round(by_country_salary$AvgSalary)

highchart() %>% 
  hc_add_series_map(worldgeojson, by_country_salary, value = "AvgSalary", joinBy = "iso3",colorByPoint =  1) %>% 
  hc_colorAxis(stops = color_stops()) %>% 
  hc_legend(enabled = TRUE) %>%  
  hc_mapNavigation(enabled = TRUE) %>%
  hc_title(text = "Median Salary by Country")  %>%
  hc_tooltip(useHTML = TRUE, headerFormat = "",
            pointFormat = "Country: {point.Country} Median Salary: ${point.AvgSalary}") %>% hc_add_theme(hc_theme_google())

Insight

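  • This graph illustrates the median salary for each country.
  • Top 5 Highest Median Salaries:
    1. South Korea $22000000
    2. Iran $15000000
    3. Vietnam $5500000
    4. Indonesia $4500000
    5. Colombia $1661000
  • These very large values are most likely amounts in local currency rather than USD, since the Salary column holds the figure as entered by the respondent. The medians are therefore not directly comparable across countries.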
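
Note that the `AvgSalary` column in the code is a median, not a mean. The difference matters when a few extreme values are present, as this base R sketch on toy data (hypothetical countries and salaries) illustrates:

```r
# Toy salaries; country "A" contains one extreme value
toy <- data.frame(
  Country = c("A", "A", "A", "B", "B", "B"),
  Salary = c(40000, 50000, 1000000, 30000, 35000, 40000)
)

# Median and mean salary per country
med <- aggregate(Salary ~ Country, data = toy, FUN = median)
avg <- aggregate(Salary ~ Country, data = toy, FUN = mean)

# Rank countries by median salary, highest first
med <- med[order(-med$Salary), ]
print(med)
print(avg)
```

The median for country "A" stays at 50000 while the mean is pulled far above every typical salary, which is why we prefer the median for this map.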

6.2 Motion Plot : Developer Type Vs Salary Vs Years of Experience

motion_df <- survey_results_public %>% select(DevType,YearsCodingProf,Salary) %>%
            mutate(YearsCodingNum = parse_number(YearsCodingProf),
                                    DevType = str_split(DevType, pattern = ";"),
                                    Salary = as.numeric(Salary)) %>%
                                    unnest(DevType) %>% 
                                    filter(!is.na(DevType)) %>% filter(!is.na(YearsCodingNum))

motion_df2 <- motion_df %>% filter(!is.na(Salary)) %>%
            select(DevType, YearsCodingNum, Salary) %>% 
            filter(!is.na(DevType)) %>% group_by(DevType,YearsCodingNum) %>% summarize(AvgSalary = median(Salary, na.rm=TRUE))


#motion_df$z <- motion_df$AvgSalary

data_strt2 <- motion_df2  %>% 
  mutate(x = YearsCodingNum, y = AvgSalary, z = 100)

data_strt2$color = distinctColorPalette(length(unique(motion_df2$DevType)))[as.numeric(as.factor(motion_df2$DevType))]

data_seqc2 <- motion_df %>% 
  arrange(DevType, YearsCodingNum) %>% 
  group_by(DevType) %>% 
  summarise(n=n()) %>%
  right_join(motion_df2, by="DevType") %>%
  group_by(DevType) %>%
  do(sequence = list_parse(select(., x = YearsCodingNum, y = AvgSalary, z = n)))

data2 <- left_join(data_strt2, data_seqc2)  

highchart() %>% 
  hc_add_series(data = data2, type = "bubble",
                minSize = 0, maxSize = 30, dataLabels = list(enabled = TRUE, format = "{point.DevType}")) %>% 
  hc_motion(enabled = TRUE, series = 0, labels = unique(motion_df2$YearsCodingNum),
            loop = TRUE, 
            updateInterval = 1000, magnet = list(step =  1)) %>% 
  hc_plotOptions(series = list(showInLegend = FALSE)) %>% 
  hc_xAxis(min = 0, max = 30, title = list(text = "Year Of Exp")) %>% 
  hc_yAxis(min = 500, max = 200000, title = list(text = "Median Salary (USD)")) %>% 
  hc_title(text = "Motion Plot of Devtype vs Salary vs Year of Exp")  %>%
  hc_tooltip(useHTML = TRUE, headerFormat = "", pointFormat = "{point.DevType} Year Of Exp: {point.x}y Median Salary: ${point.y} No Of Response : {point.z}") %>% hc_add_theme(hc_theme_google())

Insight

  • This is a motion plot of Developer Type vs Salary vs Years of Experience.
  • Full Stack developers, Back End developers and Front End developers have the highest number of respondents for most Years of Experience intervals.
  • Engineering Managers have the highest salary for most Years of Experience intervals, except for people with 27.5 years of experience.
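
For the animation, each bubble carries a `sequence`: a list with one `x`/`y`/`z` entry per frame. The sketch below builds that nested structure in base R on toy data. It is only an illustration of the shape of the data, with `split()` and `lapply()` standing in for the `group_by()`, `do()` and `list_parse()` steps used above; all values are hypothetical.

```r
# Toy medians per developer type and experience bucket (hypothetical values)
toy <- data.frame(
  DevType = c("Back-end", "Back-end", "Front-end", "Front-end"),
  YearsCodingNum = c(0, 3, 0, 3),
  AvgSalary = c(30000, 45000, 28000, 40000),
  n = c(100, 100, 80, 80)
)

# One list of frames per developer type; each frame is list(x, y, z)
frames_by_type <- lapply(split(toy, toy$DevType), function(g) {
  lapply(seq_len(nrow(g)), function(i) {
    list(x = g$YearsCodingNum[i], y = g$AvgSalary[i], z = g$n[i])
  })
})

str(frames_by_type[["Back-end"]])
```

Each inner list corresponds to one step of the slider, so the number of frames per developer type should match the number of labels passed to `hc_motion()`.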

6.3 Developer Salary : Global Vs India Vs USA

global_salary <-  survey_results_public %>% select(DevType,Salary) %>%
            mutate(DevType = str_split(DevType, pattern = ";"),
                   Salary = as.numeric(Salary)) %>%
            unnest(DevType) %>% filter(!is.na(Salary)) %>%
            select(DevType,Salary) %>% 
            filter(!is.na(DevType)) %>% group_by(DevType) %>% summarize(AvgSalary = median(Salary, na.rm=TRUE)) %>% arrange(desc(AvgSalary)) %>% head(10)
 
india_salary <-  survey_results_public %>% select(DevType,Salary,Country) %>%
            mutate(DevType = str_split(DevType, pattern = ";"),
                   Salary = as.numeric(Salary)) %>%
            unnest(DevType) %>% filter(!is.na(Salary)) %>%
            filter(Country %in% c("India")) %>% 
            select(DevType,Salary) %>% 
            filter(!is.na(DevType)) %>% group_by(DevType) %>% summarize(AvgSalary = median(Salary, na.rm=TRUE)) %>% arrange(desc(AvgSalary)) %>% head(10)
 
usa_salary <-  survey_results_public %>% select(DevType,Salary,Country) %>%
            mutate(DevType = str_split(DevType, pattern = ";"),
                   Salary = as.numeric(Salary)) %>%
            unnest(DevType) %>% filter(!is.na(Salary)) %>%
            filter(Country %in% c("United States")) %>% 
            select(DevType,Salary) %>% 
            filter(!is.na(DevType)) %>% group_by(DevType) %>% summarize(AvgSalary = median(Salary, na.rm=TRUE)) %>% arrange(desc(AvgSalary)) %>% head(10)
 
p1 <- highchart() %>%
          hc_xAxis(categories = global_salary$DevType) %>% 
          hc_add_series(name = "Median Salary $", data = global_salary$AvgSalary, colorByPoint =  1) %>% 
          hc_title(text = "Global Salary by Developer Type")  %>%
          hc_chart(type = "bar", options3d = list(enabled = TRUE, beta = 1, alpha = 1)) %>% hc_add_theme(hc_theme_google())

p2 <- highchart() %>%
          hc_xAxis(categories = india_salary$DevType) %>% 
          hc_add_series(name = "Median Salary $", data = india_salary$AvgSalary, colorByPoint =  1) %>% 
          hc_title(text = "India Salary by Developer Type")  %>%
          hc_chart(type = "bar", options3d = list(enabled = TRUE, beta = 1, alpha = 1)) %>% hc_add_theme(hc_theme_google())

p3 <- highchart() %>%
          hc_xAxis(categories = usa_salary$DevType) %>% 
          hc_add_series(name = "Median Salary $", data = usa_salary$AvgSalary, colorByPoint =  1) %>% 
          hc_title(text = "USA Salary by Developer Type")  %>%
          hc_chart(type = "bar", options3d = list(enabled = TRUE, beta = 1, alpha = 1)) %>% hc_add_theme(hc_theme_google())
          
lst <- list(
  p1,
  p2,
  p3
)

hw_grid(lst, rowheight = 400)

Insight

  • Engineering Managers have the highest salaries among all the cases taken into account, closely followed by DevOps specialists & C-suite executives.
  • Engineering Manager :
    1. Global Median : $90000
    2. India Median : $600000
    3. USA Median : $134000
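
Since a respondent can select several developer types, each answer is split on ";" and counted once per type before the median is taken. A minimal base R sketch of that idea on toy data (hypothetical values), with `strsplit()` and `rep()` standing in for `str_split()` and `unnest()`:

```r
# Toy respondents with multi-valued DevType answers (hypothetical values)
toy <- data.frame(
  DevType = c("Back-end developer;DevOps specialist",
              "Back-end developer",
              "Engineering manager;DevOps specialist"),
  Salary = c(60000, 50000, 120000),
  stringsAsFactors = FALSE
)

# Split each answer and repeat the salary once per selected type
parts <- strsplit(toy$DevType, ";", fixed = TRUE)
long <- data.frame(
  DevType = unlist(parts),
  Salary = rep(toy$Salary, times = lengths(parts))
)

# Median salary per developer type, highest first
med <- aggregate(Salary ~ DevType, data = long, FUN = median)
med <- med[order(-med$Salary), ]
print(med)
```

A side effect of this approach is that one respondent contributes to several developer types, so the groups overlap.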

6.4 Motion Plot : Developer Salary & Experience by Language

motion_df <- survey_results_public %>% select(LanguageWorkedWith,YearsCodingProf,Salary) %>%
            mutate(YearsCodingNum = parse_number(YearsCodingProf),
                                    LanguageWorkedWith = str_split(LanguageWorkedWith, pattern = ";"),
                                    Salary = as.numeric(Salary)) %>%
                                    unnest(LanguageWorkedWith) %>% 
                                    filter(!is.na(LanguageWorkedWith)) %>% filter(!is.na(YearsCodingNum))

motion_df2 <- motion_df %>% filter(!is.na(Salary)) %>%
            select(LanguageWorkedWith, YearsCodingNum, Salary) %>% 
            filter(!is.na(LanguageWorkedWith)) %>% group_by(LanguageWorkedWith,YearsCodingNum) %>% summarize(AvgSalary = median(Salary, na.rm=TRUE))


#motion_df$z <- motion_df$AvgSalary

data_strt2 <- motion_df2  %>% 
  mutate(x = YearsCodingNum, y = AvgSalary, z = 100)

data_strt2$color = distinctColorPalette(length(unique(motion_df2$LanguageWorkedWith)))[as.numeric(as.factor(motion_df2$LanguageWorkedWith))]

data_seqc2 <- motion_df %>% 
  arrange(LanguageWorkedWith, YearsCodingNum) %>% 
  group_by(LanguageWorkedWith) %>% 
  summarise(n=n()) %>%
  right_join(motion_df2, by="LanguageWorkedWith") %>%
  group_by(LanguageWorkedWith) %>%
  do(sequence = list_parse(select(., x = YearsCodingNum, y = AvgSalary, z = n)))

data2 <- left_join(data_strt2, data_seqc2)  

highchart() %>% 
  hc_add_series(data = data2, type = "bubble",
                minSize = 0, maxSize = 30, dataLabels = list(enabled = TRUE, format = "{point.LanguageWorkedWith}")) %>% 
  hc_motion(enabled = TRUE, series = 0, labels = unique(motion_df2$YearsCodingNum),
            loop = TRUE, 
            updateInterval = 1000, magnet = list(step =  1)) %>% 
  hc_plotOptions(series = list(showInLegend = FALSE)) %>% 
  hc_xAxis(min = 0, max = 30, title = list(text = "Year Of Exp")) %>% 
  hc_yAxis(min = 500, max = 200000, title = list(text = "Median Salary (USD)")) %>% 
  hc_title(text = "Motion Plot of Programming Language vs Salary vs Year of Exp")  %>%
  hc_tooltip(useHTML = TRUE, headerFormat = "", pointFormat = "{point.LanguageWorkedWith} Year Of Exp: {point.x}y Median Salary: ${point.y} No Of Response : {point.z}") %>% hc_add_theme(hc_theme_google())

Insight

  • It is observed that less popular programming languages like OCaml, Hack, Haskell, Julia, etc. tend to have higher median salaries with increasing work experience.
  • The more common programming languages like SQL, C#, JavaScript, etc. have moderate salary levels, with discrepancies ranging from $30000 to $70000.

6.5 Developer Salary by Gender

by_salary_gender <- survey_results_public %>% select(Gender,Salary,YearsCodingProf) %>%
                        mutate(YearsCodingNum = parse_number(YearsCodingProf),
                                    Gender = str_split(Gender, pattern = ";"),
                                    Salary = as.numeric(Salary)) %>%
                                    unnest(Gender) %>% 
                                    filter(!is.na(Salary)) %>%
                                    select(Gender, YearsCodingNum, Salary) %>% 
                                    filter(!is.na(Gender)) %>% group_by(Gender,YearsCodingNum) %>% 
                                    summarize(AvgSalary = median(Salary, na.rm=TRUE))
                                    
hchart(by_salary_gender, "line", hcaes(x = YearsCodingNum, y = AvgSalary, group = Gender)) %>%
hc_xAxis(min = 0, max = 30, title = list(text = "Year Of Exp")) %>% 
  hc_yAxis(min = 500, max = 200000, title = list(text = "Median Salary (USD)"))  %>% hc_add_theme(hc_theme_google())

Insight

  • For 0-6 years, female respondents have higher median salaries. However, the number of female respondents is lower.
  • For 9-30 years, transgender and non-binary respondents have higher salaries, with a single instance of male respondents having a high salary (27 years).
  • A general trend of increasing salary with experience is seen for all genders, but the number of respondents earning high salaries is lower.
  • Also, since the number of female, transgender and non-binary respondents is low (about 10% of the total gender distribution), their median values are not reliable, as they are easily skewed. A much more stable and gradually increasing graph can be seen for male respondents due to the abundance of data.
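
One way to make the reliability concern above visible is to report the group size next to each median and flag small groups. The sketch below uses toy data, and the cut-off `min_n` is a hypothetical choice, not a value taken from the survey.

```r
# Toy salaries (hypothetical); the "Female" group is deliberately tiny
toy <- data.frame(
  Gender = c(rep("Male", 6), rep("Female", 2)),
  Salary = c(50000, 52000, 55000, 58000, 60000, 61000, 40000, 150000)
)

# Median and group size per gender
med <- tapply(toy$Salary, toy$Gender, median)
n <- tapply(toy$Salary, toy$Gender, length)
stats <- data.frame(
  Gender = names(med),
  MedianSalary = as.numeric(med),
  n = as.integer(n)
)

# Flag groups below a hypothetical minimum sample size
min_n <- 5
stats$reliable <- stats$n >= min_n
print(stats)
```

Here the two-person group gets the higher median purely because of one large value, which is the kind of skew described in the last bullet.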

6.6 Developer Type vs Median Salary by Year of Exp

by_salary_devtype <- survey_results_public %>% select(DevType,Salary,YearsCodingProf) %>%
                        mutate(YearsCodingNum = parse_number(YearsCodingProf),
                                    DevType = str_split(DevType, pattern = ";"),
                                    Salary = as.numeric(Salary)) %>%
                                    unnest(DevType) %>% 
                                    filter(!is.na(Salary)) %>%
                                    select(DevType, YearsCodingNum, Salary) %>% 
                                    filter(!is.na(DevType)) %>% group_by(DevType,YearsCodingNum) %>% 
                                    summarize(AvgSalary = median(Salary, na.rm=TRUE))
                                    
hchart(by_salary_devtype, "spline", hcaes(x = YearsCodingNum, y = AvgSalary, group = DevType)) %>%
hc_xAxis(min = 0, max = 30, title = list(text = "Year Of Exp")) %>% 
hc_yAxis(min = 500, max = 180000, title = list(text = "Median Salary (USD)"))  %>%
hc_legend(align = "left", layout = "vertical", verticalAlign = "top") %>% 
hc_tooltip(sort = TRUE, table = TRUE)  %>%
hc_title(text = "Developer Type vs Median Salary by Year of Exp")  %>% 
hc_add_theme(hc_theme_google())

6.7 Programming Language vs Median Salary by Year of Exp

by_salary_LanguageWorkedWith <- survey_results_public %>% select(LanguageWorkedWith,Salary,YearsCodingProf) %>%
                        mutate(YearsCodingNum = parse_number(YearsCodingProf),
                                    LanguageWorkedWith = str_split(LanguageWorkedWith, pattern = ";"),
                                    Salary = as.numeric(Salary)) %>%
                                    unnest(LanguageWorkedWith) %>% 
                                    filter(!is.na(Salary)) %>%
                                    select(LanguageWorkedWith, YearsCodingNum, Salary) %>% 
                                    filter(!is.na(LanguageWorkedWith)) %>% group_by(LanguageWorkedWith,YearsCodingNum) %>% 
                                    summarize(AvgSalary = median(Salary, na.rm=TRUE))
                                    
hchart(by_salary_LanguageWorkedWith, "spline", hcaes(x = YearsCodingNum, y = AvgSalary, group = LanguageWorkedWith)) %>%
hc_xAxis(min = 0, max = 30, title = list(text = "Year Of Exp")) %>% 
hc_yAxis(min = 500, max = 180000, title = list(text = "Median Salary (USD)"))  %>%
hc_legend(align = "left", layout = "vertical", verticalAlign = "top") %>% 
hc_tooltip(sort = TRUE, table = TRUE)  %>%
hc_title(text = "Programming Language vs Median Salary by Year of Exp")  %>% 
hc_add_theme(hc_theme_google())

7. What do developers think about AI Technology?

7.1 Most Dangerous Aspect of Increasingly Advanced AI Technology

by_AIDangerous <- survey_results_public %>%
                        filter(!is.na(AIDangerous)) %>%
                        group_by(AIDangerous) %>%
                        summarise(Total = n())  %>%
                        arrange(desc(Total)) %>%
                        ungroup() %>%
                        mutate(AIDangerous = reorder(AIDangerous,Total)) %>%
                        mutate(Percent = round(Total/sum(Total)*100))

highchart() %>%
  hc_xAxis(categories = by_AIDangerous$AIDangerous) %>% 
  hc_add_series(name = "Percent %", data = by_AIDangerous$Percent, colorByPoint =  1) %>% 
  hc_title(text = "Most Dangerous Aspect of Increasingly Advanced AI Technology")  %>%
  hc_chart(type = "bar", options3d = list(enabled = TRUE, beta = 1, alpha = 1)) %>% hc_add_theme(hc_theme_google())

7.2 Most Dangerous Aspect by Developer Type

survey_results_public2 <-  survey_results_public %>%    mutate(DevType = strsplit(as.character(DevType), ";"))  %>%
                                                        unnest(DevType)

df1 <- survey_results_public2 %>%
       filter(!is.na(DevType)) %>%
       group_by(name = DevType, drilldown = tolower(DevType)) %>% 
       summarise(y = n()) %>% arrange(desc(y))

df2 <-survey_results_public2 %>% filter(!is.na(DevType)) %>% filter(!is.na(AIDangerous)) %>% group_by(DevType,AIDangerous) %>% dplyr::mutate(y = n(),colorByPoint =  1) %>%arrange(desc(y))%>%
  group_by(name = DevType, id = tolower(DevType),colorByPoint) %>% 
  do(data = list_parse(
                  mutate(.,name = AIDangerous, drilldown = tolower(paste(DevType,AIDangerous,sep=": "))) %>% 
                      group_by(name,drilldown) %>% 
                        summarise(y=n())%>% dplyr::select(name, y, drilldown)   %>%
                            arrange(desc(y))) 
    )
    
highchart() %>% 
  hc_chart(type = "column") %>%
  hc_title(text = 'Developer wise Opinion on Dangerous Aspect of AI') %>%
  hc_add_series(data = df1, name = "Developer Type",colorByPoint =  1) %>% 
  hc_legend(enabled = FALSE) %>%
  hc_xAxis(type = "category") %>% 
  hc_drilldown(
    allowPointDrilldown = TRUE,
    series =list_parse(df2)
  ) %>% hc_add_theme(hc_theme_google())
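
The drilldown chart above works by giving every top-level point a `drilldown` key that matches the `id` of one drilldown series. The nested structure can be sketched in base R as follows. This is a simplified illustration on toy data (hypothetical answers), with plain lists standing in for the `list_parse()` output.

```r
# Toy answers (hypothetical values, not survey data)
toy <- data.frame(
  DevType = c("Back-end", "Back-end", "Back-end", "Front-end", "Front-end"),
  AIDangerous = c("Jobs", "Jobs", "Fairness", "Jobs", "Fairness"),
  stringsAsFactors = FALSE
)

# Top level: one point per developer type, linked through `drilldown`
top_counts <- table(toy$DevType)
top_level <- lapply(names(top_counts), function(d) {
  list(name = d, y = as.integer(top_counts[[d]]), drilldown = tolower(d))
})

# Drilldown level: one series per developer type, identified by `id`
drill_series <- lapply(names(top_counts), function(d) {
  sub_counts <- sort(table(toy$AIDangerous[toy$DevType == d]), decreasing = TRUE)
  list(
    name = d,
    id = tolower(d),
    data = lapply(names(sub_counts), function(a) {
      list(name = a, y = as.integer(sub_counts[[a]]))
    })
  )
})

str(drill_series[[1]])
```

If a point's `drilldown` value has no series with the same `id`, clicking that column does nothing, so both sides should be built from the same key (here `tolower(DevType)`).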

7.3 Most Exciting Aspect of Increasingly Advanced AI Technology

by_AIInteresting <- survey_results_public %>%
                        filter(!is.na(AIInteresting)) %>%
                        group_by(AIInteresting) %>%
                        summarise(Total = n())  %>%
                        arrange(desc(Total)) %>%
                        ungroup() %>%
                        mutate(AIInteresting = reorder(AIInteresting,Total)) %>%
                        mutate(Percent = round(Total/sum(Total)*100))

highchart() %>%
  hc_xAxis(categories = by_AIInteresting$AIInteresting) %>% 
  hc_add_series(name = "Percent %", data = by_AIInteresting$Percent, colorByPoint =  1) %>% 
  hc_title(text = "Most Exciting Aspect of Increasingly Advanced AI Technology")  %>%
  hc_chart(type = "bar", options3d = list(enabled = TRUE, beta = 1, alpha = 1)) %>% hc_add_theme(hc_theme_google())

7.4 Most Exciting Aspect by Developer Type

survey_results_public2 <-  survey_results_public %>%    mutate(DevType = strsplit(as.character(DevType), ";"))  %>%
                                                        unnest(DevType)

df1 <- survey_results_public2 %>%
       filter(!is.na(DevType)) %>%
       group_by(name = DevType, drilldown = tolower(DevType)) %>% 
       summarise(y = n()) %>% arrange(desc(y))

df2 <-survey_results_public2 %>% filter(!is.na(DevType)) %>% filter(!is.na(AIInteresting)) %>% group_by(DevType,AIInteresting) %>% dplyr::mutate(y = n(),colorByPoint =  1) %>%arrange(desc(y))%>%
  group_by(name = DevType, id = tolower(DevType),colorByPoint) %>% 
  do(data = list_parse(
                  mutate(.,name = AIInteresting, drilldown = tolower(paste(DevType,AIInteresting,sep=": "))) %>% 
                      group_by(name,drilldown) %>% 
                        summarise(y=n())%>% dplyr::select(name, y, drilldown)   %>%
                            arrange(desc(y))) 
    )
    
highchart() %>% 
  hc_chart(type = "column") %>%
  hc_title(text = 'Developer wise Opinion on Exciting Aspect of AI') %>%
  hc_add_series(data = df1, name = "Developer Type",colorByPoint =  1) %>% 
  hc_legend(enabled = FALSE) %>%
  hc_xAxis(type = "category") %>% 
  hc_drilldown(
    allowPointDrilldown = TRUE,
    series =list_parse(df2)
  ) %>% hc_add_theme(hc_theme_google())

7.5 Ramifications of Increasingly Advanced AI Technology

by_AIResponsible <- survey_results_public %>%
                        filter(!is.na(AIResponsible)) %>%
                        group_by(AIResponsible) %>%
                        summarise(Total = n())  %>%
                        arrange(desc(Total)) %>%
                        ungroup() %>%
                        mutate(AIResponsible = reorder(AIResponsible,Total)) %>%
                        mutate(Percent = round(Total/sum(Total)*100))

highchart() %>%
  hc_xAxis(categories = by_AIResponsible$AIResponsible) %>% 
  hc_add_series(name = "Percent %", data = by_AIResponsible$Percent, colorByPoint =  1) %>% 
  hc_title(text = "Ramifications of Increasingly Advanced AI Technology")  %>%
  hc_chart(type = "bar", options3d = list(enabled = TRUE, beta = 1, alpha = 1)) %>% hc_add_theme(hc_theme_google())

7.6 Take on the Future of Artificial Intelligence

by_AIFuture <- survey_results_public %>%
                        filter(!is.na(AIFuture)) %>%
                        group_by(AIFuture) %>%
                        summarise(Total = n())  %>%
                        arrange(desc(Total)) %>%
                        ungroup() %>%
                        mutate(AIFuture = reorder(AIFuture,Total)) %>%
                        mutate(Percent = round(Total/sum(Total)*100))

highchart() %>%
  hc_xAxis(categories = by_AIFuture$AIFuture) %>% 
  hc_add_series(name = "Percent %", data = by_AIFuture$Percent, colorByPoint =  1) %>% 
  hc_title(text = "Take on the Future of Artificial Intelligence")  %>%
  hc_chart(type = "bar", options3d = list(enabled = TRUE, beta = 1, alpha = 1)) %>% hc_add_theme(hc_theme_google())

C. Employment Analysis


1. What is the Employment status?

1.1 Employment status

by_Employment <- survey_results_public %>%
                        filter(!is.na(Employment)) %>%
                        group_by(Employment) %>%
                        summarise(Total = n())  %>%
                        arrange(desc(Total)) %>%
                        ungroup() %>%
                        mutate(Employment = reorder(Employment,Total)) %>%
                        mutate(Percent = (Total/sum(Total)*100)) %>%
                        head(10)

highchart() %>%
  hc_xAxis(categories = by_Employment$Employment) %>% 
  hc_add_series(name = "Percent %", data = by_Employment$Percent, colorByPoint =  1) %>% 
  hc_title(text = "Employment Status")  %>%
  hc_chart(type = "bar", options3d = list(enabled = TRUE, beta = 1, alpha = 1)) %>% hc_add_theme(hc_theme_google())

1.2 Employment Status By Country

df1 <- survey_results_public %>% filter(!is.na(Employment)) %>%
  group_by(name = Employment, drilldown = tolower(Employment)) %>% 
  summarise(y = n()) %>% arrange(desc(y)) %>% head(10)


df2 <-survey_results_public %>% filter(!is.na(Employment)) %>% filter(!is.na(Country)) %>%  group_by(Employment,Country) %>% dplyr::mutate(y = n(),colorByPoint =  1) %>%arrange(desc(y))%>%
  group_by(name = Employment, id = tolower(Employment),colorByPoint) %>% 
  do(data = list_parse(
                  mutate(.,name = Country, drilldown = tolower(paste(Employment,Country,sep=": "))) %>% 
                      group_by(name,drilldown) %>% 
                        summarise(y=n())%>% dplyr::select(name, y, drilldown)   %>%
                            arrange(desc(y))) %>% head(10)
    )

highchart() %>% 
  hc_chart(type = "column") %>%
  hc_title(text = 'Employment Status By Country') %>%
  hc_add_series(data = df1, name = "Employment Status",colorByPoint =  1) %>% 
  hc_legend(enabled = FALSE) %>%
  hc_xAxis(type = "category") %>% 
  hc_yAxis(title = list(text = "Total Response"))  %>%
  hc_drilldown(
    allowPointDrilldown = TRUE,
    series = list_parse(df2)
  ) %>% hc_add_theme(hc_theme_google())

More Insights Incoming. Stay Tuned & Upvote!

Constructive criticism is welcome. If there are any suggestions or changes you would like to see in the Kernel please let us know.